Download the zipfile from https://github.com/dlab-geo/RGeocoding
unzip the files
start RStudio
start a new R script file
set your working directory to the location of the unzipped files
About me
About you - why are you here?
ggmapESRI World Geocoding ServiceObtain geographic coordinates for a named place, street address, or zip code.
Display locations on a map
Link locations to other data
File of Input locations
A Geocoder
Output file
What types of data are available/ should be used for matching
Where do you think the point for 275 5th St NE would be located?
Where do you think the point for 275 5th St NE would be located?
Addresses would match to centroid of parcel or rooftop
Try entering a city, landmark (Sather Gate), zipcode, address, etc.
Needed when
You can use ESRI ArcGIS/ArcPro with local reference database
- you need to install local reference database
- limited to USA & Canada
- needs to be updated manually
Available in the D-Lab
Geocode the following address in all three tools:
2625 Dana St, Berkeley, CA, 94704
ggmapThis has been the most common R package for geocoding.
But, beginning July 2018 Google changed its pricing model.
For API access you need to register a credit card with Google
$200 worth of free API services per month
That’s 40,000 geocoded locations
See the Google Geocoding API Documentation for details
ggmapWorth the trouble because output is great quality.
But only if you can afford the costs!
Let’s try it.
ggmapLoad the library
library(ggmap)
## Loading required package: ggplot2
## Google Maps API Terms of Service: http://developers.google.com/maps/terms.
## Please cite ggmap if you use it: see citation("ggmap") for details.
ggmapRegister your API key
#mykey <- "AIzaSyxxxxxxxxxxxxxxxxxxxxxxxxxOQyOFWrTw"
mykey <- "AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw"
register_google(key=mykey)
geocode("San Francisco, CA")
geocode("California")
geocode("Golden Gate Bridge")
geocode("Golden Gate Bridge", output="more")
revgeocode(c(-122.4194,37.77493), output="more")
geocode_ouput <-geocode("2625 Dana St, Berkeley CA, 94704", output="more")
Geocode a set of places at once.
Usually means you upload the entire file to the geocoder.
This is much faster but not supported by most online services
With ggmap::geocode we serial process the set in one command.
# File of addresses
address_data <- read.csv("address_data/oak_liquor_stores.csv", stringsAsFactors = F)
# Take a look
head(address_data)
## id name street city state zip type
## 1 1 Wah Fay Liquors 2101 8th Ave Oakland CA 94606 p
## 2 2 Vision Liquor 1615 Macarthur Blvd Oakland CA 94602 p
## 3 3 Souza's Liquors 394 12th St Oakland CA 94607 p
## 4 4 Tk Liquors 1500 23th Ave Oakland CA 94606 p
## 5 5 Quadriga Wines Inc 6193 Ridgemont Dr Oakland CA 94619 p
## 6 6 Bev Mo 525 Embarcadero W Oakland CA 94607 c
Many online services want a single column with the address data, like "2625 Dana St, Berkeley CA, 94704"
address_data$full_address <- paste0(address_data$street, ", " ,
address_data$city, ", " ,
address_data$state, " ",
address_data$zip)
# Take a look
head(address_data, 3)
## id name street city state zip type
## 1 1 Wah Fay Liquors 2101 8th Ave Oakland CA 94606 p
## 2 2 Vision Liquor 1615 Macarthur Blvd Oakland CA 94602 p
## 3 3 Souza's Liquors 394 12th St Oakland CA 94607 p
## full_address
## 1 2101 8th Ave, Oakland, CA 94606
## 2 1615 Macarthur Blvd, Oakland, CA 94602
## 3 394 12th St, Oakland, CA 94607
google_geocoded <- geocode(address_data$full_address, output = "more",
source = "google", key=mykey)
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=2101%208th%20Ave%2C%20Oakland%2C%20CA%2094606&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Warning: package 'bindrcpp' was built under R version 3.4.4
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=1615%20Macarthur%20Blvd%2C%20Oakland%2C%20CA%2094602&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=394%2012th%20St%2C%20Oakland%2C%20CA%2094607&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=1500%2023th%20Ave%2C%20Oakland%2C%20CA%2094606&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=6193%20Ridgemont%20Dr%2C%20Oakland%2C%20CA%2094619&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=525%20Embarcadero%20W%2C%20%20Oakland%2C%20CA%2094607&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=5403%20Foothill%20Blvd%2C%20Oakland%2C%20CA%2094601&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=1200%2078th%20Ave%2C%20Oakland%2C%20CA%2094621&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=828%20Franklin%20St%2C%20Oakland%2C%20CA%2094607&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=5913%20International%20Blvd%2C%20Oakland%2C%20CA%2094621&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=3210%20Harrison%20St%2C%20Oakland%2C%20CA%2094611&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=1460%207th%20St%2C%20Oakland%2C%20CA%2094607&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=1333%20Peralta%20St%2C%20Oakland%2C%20CA%2094607&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=3710%20Telegraph%20Ave%2C%20Oakland%2C%20CA%2094609&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=3293%20Lakeshore%20Ave%2C%20Oakland%2C%20CA%2094610&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=1647%208th%20St%2C%20Oakland%2C%20CA%2094607&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=3849%20Martin%20Luther%20King%20Jr%20Way%2C%20Oakland%2C%20CA%2094609&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=3900%20Grand%20Ave%2C%20Oakland%2C%20CA%2094610&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=7305%20Edgewater%20Dr%20%23D%2C%20Oakland%2C%20CA%2094621&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=%20350%20E%2018th%20St%2C%20Oakland%2C%20CA%2094606&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
Google returns a lot of metadata - see documentation
head(google_geocoded)
## lon lat type loctype
## 1 -122.2449 37.79836 premise rooftop
## 2 -122.2237 37.80033 street_address rooftop
## 3 -122.2705 37.80262 street_address rooftop
## 4 -122.2350 37.78426 street_address rooftop
## 5 -122.1672 37.78434 premise rooftop
## 6 -122.2791 37.79590 street_address rooftop
## address north south east
## 1 2101 8th ave, oakland, ca 94606, usa 37.79972 37.79702 -122.2435
## 2 1615 macarthur blvd, oakland, ca 94602, usa 37.80168 37.79898 -122.2223
## 3 394 12th st, oakland, ca 94607, usa 37.80397 37.80127 -122.2692
## 4 1500 23rd ave, oakland, ca 94606, usa 37.78561 37.78291 -122.2336
## 5 6193 ridgemont dr, oakland, ca 94619, usa 37.78569 37.78299 -122.1658
## 6 525 embarcadero west, oakland, ca 94607, usa 37.79725 37.79455 -122.2778
## west street_number route locality
## 1 -122.2462 2101 8th Avenue Oakland
## 2 -122.2250 1615 MacArthur Boulevard Oakland
## 3 -122.2719 394 12th Street Oakland
## 4 -122.2363 1500 23rd Avenue Oakland
## 5 -122.1685 6193 Ridgemont Drive Oakland
## 6 -122.2805 525 Embarcadero West Oakland
## administrative_area_level_2 administrative_area_level_1 country
## 1 Alameda County California United States
## 2 Alameda County California United States
## 3 Alameda County California United States
## 4 Alameda County California United States
## 5 Alameda County California United States
## 6 Alameda County California United States
## postal_code postal_code_suffix neighborhood subpremise
## 1 94606 2007 <NA> <NA>
## 2 94602 1606 Glenview <NA>
## 3 94607 4249 Downtown Oakland <NA>
## 4 94606 5035 Rancho San Antonio <NA>
## 5 94619 3724 Caballo Hills <NA>
## 6 94607 3565 Downtown Oakland <NA>
# Add geocoded points to input data frame
# - works because num rows & row order returned by geocoder the same
address_data$glon <- google_geocoded$lon
address_data$glat <- google_geocoded$lat
head(address_data, 3) # take a look
## id name street city state zip type
## 1 1 Wah Fay Liquors 2101 8th Ave Oakland CA 94606 p
## 2 2 Vision Liquor 1615 Macarthur Blvd Oakland CA 94602 p
## 3 3 Souza's Liquors 394 12th St Oakland CA 94607 p
## full_address glon glat
## 1 2101 8th Ave, Oakland, CA 94606 -122.2449 37.79836
## 2 1615 Macarthur Blvd, Oakland, CA 94602 -122.2237 37.80033
## 3 394 12th St, Oakland, CA 94607 -122.2705 37.80262
library(leaflet)
point_map <- leaflet(address_data) %>%
addTiles() %>%
addMarkers(lat=~glat, lng=~glon,
popup=(paste0(address_data$name, "</br>",
address_data$full_address)
)
)
point_map
UC Berkeley has an ESRI Site licence
So we can use the World Geocoding Service for free.
This is a great value as geocoding is expensive!
In order to access the World Geocoding Service you need to first get an API key.
ESRI calls this an access token.
Instructions for this are on page 15 of this document.
Let’s try it!
You can use the token below for this workshop.
Tokens are temporary
2 hours.14 days even if you specify a longer expiration period.Your token should look something like this:
my_esri_token <- "iNFc0fq6bGlj6mUfptHgEGEjVeKZ0T5EPNAkVjgxBotwDaXxgykmwDNR0AkEvPi8Sh0QESWgt0IhmsbbO5UA3H-b2_T5yzhn1fW--WwD2O-7V2gbGCEaF4jg38dUQ7LhpTBT_XqYy0vV7Mw_e2hWUA.."
Two functions: geocode_one and geocode_many
These functions are long and ugly.
You can reference them by sourcing this script.
Take a look at the file on your computer
source("./scripts/esri_wgs_geocoding.R")
geocode_one("2625 Dana St, Berkeley, CA, 94704", my_esri_token,
postal = TRUE)
## Loading required package: httr
## lon lat score locName status
## 1 -122.2601 37.86202 100 World M
## matchAddr side addressType
## 1 2625 Dana St, Berkeley, California, 94704 R PointAddress
# Batch geocode your dataframe of addresses with the following function
esri_geocoded <- geocode_many(address_data$id, address_data$street,
address_data$city, address_data$state,
as.character(address_data$zip), my_esri_token)
## Loading required package: rjson
ESRI WGS also returns a lot of metadata
head(esri_geocoded, 3)
## ID lon lat score locName status
## 1 2 -122.2236 37.80054 100 World M
## 2 4 -122.2351 37.78423 100 World M
## 3 1 -122.2448 37.79827 100 World M
## matchAddr side addressType
## 1 1615 MacArthur Blvd, Oakland, California, 94602 L PointAddress
## 2 1500 23rd Ave, Oakland, California, 94606 R StreetAddress
## 3 2101 8th Ave, Oakland, California, 94606 L PointAddress
address_data <- merge(address_data, esri_geocoded[c("ID","lon","lat")],
by.x="id",by.y = "ID", all.x = T)
# Take a look
head(address_data, 3)
## id name street city state zip type
## 1 1 Wah Fay Liquors 2101 8th Ave Oakland CA 94606 p
## 2 2 Vision Liquor 1615 Macarthur Blvd Oakland CA 94602 p
## 3 3 Souza's Liquors 394 12th St Oakland CA 94607 p
## full_address glon glat lon
## 1 2101 8th Ave, Oakland, CA 94606 -122.2449 37.79836 -122.2448
## 2 1615 Macarthur Blvd, Oakland, CA 94602 -122.2237 37.80033 -122.2236
## 3 394 12th St, Oakland, CA 94607 -122.2705 37.80262 -122.2705
## lat
## 1 37.79827
## 2 37.80054
## 3 37.80238
Leafletpoint_map <- leaflet() %>%
addTiles() %>%
addMarkers(lat=address_data$glat, lng=address_data$glon,
popup=(paste0(address_data$name, "</br>",
address_data$full_address))
) %>%
addCircleMarkers(lat=address_data$lat, lng=address_data$lon,
color="black",fillColor="red",
popup=(paste0(address_data$name, "</br>",
address_data$full_address))
)
point_map
Good question!
You don’t want to re-geocode!
write.csv(google_geocoded,file="output/address_data_geocoded_google.csv",
row.names=FALSE)
write.csv(esri_geocoded,file="output/address_data_geocoded_esri.csv",
row.names=FALSE)
Map the results
| Use | For These |
|---|---|
| HWY | Highway |
| LN | Lane |
| DR | Drive |
| EXPY | Expressway |
Inaccurate feature attributes
Address changes
The US Census Bureau has been collecting data about the US population since 1790!
Survey data from the decennial census and the 5 year American Community Survey are two of the most commonly used databases.
Publicly available census data are aggregated to census geographies.
Census data are linked to Census geographies by IDs called FIPS Codes or GEOIDs
In order to link geocoded points to census data you need to get the census geographic FIPS codes for the points.
There are a number of ways to do this.
Census geographies can change every 10 years due to changes in the population.
For example,
if the population of an area increases, census tracts can split
if the population of an area decreases, tracts can merge
This makes comparisons between censuses more complicated.
If you geocode with the US Census Geocoding Service you can get the census IDs from the geocoder.
Try that with “2625 Dana St, Berkeley, CA, 94704”
Provides an programming interface for fetching the 2010 Block FIPS code
https://geo.fcc.gov/api/census/#!/block/get_block_find
https://geo.fcc.gov/api/census/block/find?latitude=37.85256200000
&longitude=-122.2736340&showall=true&format=json>
Source the script file to load the function
latlon2fips
Then try it with -122.273634, 37.852562
source("./scripts/fcc_latlon2fips.R")
# test one coordinate pair
latlon2fips(latitude=37.852562, longitude=-122.273634)
## [1] "060014240011002"
Apply the latlon2fips function to all rows in our dataframe
Use the coordinates returned by the Google Geocoding API
address_data$fips<- mapply(latlon2fips, address_data$glat,
address_data$glon)
# Take a look
head(address_data, 3)
## id name street city state zip type
## 1 1 Wah Fay Liquors 2101 8th Ave Oakland CA 94606 p
## 2 2 Vision Liquor 1615 Macarthur Blvd Oakland CA 94602 p
## 3 3 Souza's Liquors 394 12th St Oakland CA 94607 p
## full_address glon glat lon
## 1 2101 8th Ave, Oakland, CA 94606 -122.2449 37.79836 -122.2448
## 2 1615 Macarthur Blvd, Oakland, CA 94602 -122.2237 37.80033 -122.2236
## 3 394 12th St, Oakland, CA 94607 -122.2705 37.80262 -122.2705
## lat fips
## 1 37.79827 060014055003004
## 2 37.80054 060014049003026
## 3 37.80238 060014030001005
We can download geographic data from the Census
and then spatially join the geocoded points to the census geographic data
to get the FIPS codes for each point
We can use the tigris package to download census geographic data.
library(sp)
## Warning: package 'sp' was built under R version 3.4.3
library(tigris)
## To enable
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
##
## Attaching package: 'tigris'
## The following object is masked from 'package:graphics':
##
## plot
tigrisSet tigris options & download data
options(tigris_class = "sp") # options are sp or sf
options(tigris_use_cache = F) # set to true to save locally
tracts2010 <- tracts(state = '06', county= '001', cb = F, year=2010)
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|==== | 7%
|
|===== | 7%
|
|===== | 8%
|
|====== | 9%
|
|====== | 10%
|
|======= | 10%
|
|======= | 11%
|
|======= | 12%
|
|======== | 12%
|
|======== | 13%
|
|========= | 13%
|
|========= | 14%
|
|========= | 15%
|
|========== | 15%
|
|========== | 16%
|
|=========== | 16%
|
|=========== | 17%
|
|=========== | 18%
|
|============ | 18%
|
|============ | 19%
|
|============= | 19%
|
|============= | 20%
|
|============= | 21%
|
|============== | 21%
|
|============== | 22%
|
|=============== | 22%
|
|=============== | 23%
|
|=============== | 24%
|
|================ | 24%
|
|================ | 25%
|
|================= | 25%
|
|================= | 26%
|
|================= | 27%
|
|================== | 27%
|
|================== | 28%
|
|=================== | 28%
|
|=================== | 29%
|
|=================== | 30%
|
|==================== | 30%
|
|==================== | 31%
|
|==================== | 32%
|
|===================== | 32%
|
|===================== | 33%
|
|====================== | 33%
|
|====================== | 34%
|
|====================== | 35%
|
|======================= | 35%
|
|======================= | 36%
|
|======================== | 36%
|
|======================== | 37%
|
|======================== | 38%
|
|========================= | 38%
|
|========================= | 39%
|
|========================== | 39%
|
|========================== | 40%
|
|========================== | 41%
|
|=========================== | 41%
|
|=========================== | 42%
|
|============================ | 42%
|
|============================ | 43%
|
|============================ | 44%
|
|============================= | 44%
|
|============================= | 45%
|
|============================== | 45%
|
|============================== | 46%
|
|============================== | 47%
|
|=============================== | 47%
|
|=============================== | 48%
|
|================================ | 48%
|
|================================ | 49%
|
|================================ | 50%
|
|================================= | 50%
|
|================================= | 51%
|
|================================= | 52%
|
|================================== | 52%
|
|================================== | 53%
|
|=================================== | 53%
|
|=================================== | 54%
|
|=================================== | 55%
|
|==================================== | 55%
|
|==================================== | 56%
|
|===================================== | 56%
|
|===================================== | 57%
|
|===================================== | 58%
|
|====================================== | 58%
|
|====================================== | 59%
|
|======================================= | 59%
|
|======================================= | 60%
|
|======================================= | 61%
|
|======================================== | 61%
|
|======================================== | 62%
|
|========================================= | 62%
|
|========================================= | 63%
|
|========================================= | 64%
|
|========================================== | 64%
|
|========================================== | 65%
|
|=========================================== | 65%
|
|=========================================== | 66%
|
|=========================================== | 67%
|
|============================================ | 67%
|
|============================================ | 68%
|
|============================================= | 69%
|
|============================================= | 70%
|
|============================================== | 70%
|
|============================================== | 71%
|
|============================================== | 72%
|
|=============================================== | 72%
|
|=============================================== | 73%
|
|================================================ | 73%
|
|================================================ | 74%
|
|================================================ | 75%
|
|================================================= | 75%
|
|================================================= | 76%
|
|================================================== | 76%
|
|================================================== | 77%
|
|================================================== | 78%
|
|=================================================== | 78%
|
|=================================================== | 79%
|
|==================================================== | 79%
|
|==================================================== | 80%
|
|==================================================== | 81%
|
|===================================================== | 81%
|
|===================================================== | 82%
|
|====================================================== | 82%
|
|====================================================== | 83%
|
|====================================================== | 84%
|
|======================================================= | 84%
|
|======================================================= | 85%
|
|======================================================== | 85%
|
|======================================================== | 86%
|
|======================================================== | 87%
|
|========================================================= | 87%
|
|========================================================= | 88%
|
|========================================================== | 88%
|
|========================================================== | 89%
|
|========================================================== | 90%
|
|=========================================================== | 90%
|
|=========================================================== | 91%
|
|============================================================ | 92%
|
|============================================================ | 93%
|
|============================================================= | 93%
|
|============================================================= | 94%
|
|============================================================= | 95%
|
|============================================================== | 95%
|
|============================================================== | 96%
|
|=============================================================== | 96%
|
|=============================================================== | 97%
|
|=============================================================== | 98%
|
|================================================================ | 98%
|
|================================================================ | 99%
|
|=================================================================| 99%
|
|=================================================================| 100%
tracts2000 <- tracts(state = '06', county= '001', cb = F, year=2000)
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|==== | 7%
|
|===== | 7%
|
|===== | 8%
|
|====== | 9%
|
|====== | 10%
|
|======= | 10%
|
|======= | 11%
|
|======= | 12%
|
|======== | 12%
|
|======== | 13%
|
|========= | 13%
|
|========= | 14%
|
|========= | 15%
|
|========== | 15%
|
|========== | 16%
|
|=========== | 16%
|
|=========== | 17%
|
|=========== | 18%
|
|============ | 18%
|
|============ | 19%
|
|============= | 19%
|
|============= | 20%
|
|============= | 21%
|
|============== | 21%
|
|============== | 22%
|
|=============== | 22%
|
|=============== | 23%
|
|=============== | 24%
|
|================ | 24%
|
|================ | 25%
|
|================= | 25%
|
|================= | 26%
|
|================= | 27%
|
|================== | 27%
|
|================== | 28%
|
|=================== | 28%
|
|=================== | 29%
|
|=================== | 30%
|
|==================== | 30%
|
|==================== | 31%
|
|==================== | 32%
|
|===================== | 32%
|
|===================== | 33%
|
|====================== | 33%
|
|====================== | 34%
|
|====================== | 35%
|
|======================= | 35%
|
|======================= | 36%
|
|======================== | 36%
|
|======================== | 37%
|
|======================== | 38%
|
|========================= | 38%
|
|========================= | 39%
|
|========================== | 39%
|
|========================== | 40%
|
|========================== | 41%
|
|=========================== | 41%
|
|=========================== | 42%
|
|============================ | 42%
|
|============================ | 43%
|
|============================ | 44%
|
|============================= | 44%
|
|============================= | 45%
|
|============================== | 45%
|
|============================== | 46%
|
|============================== | 47%
|
|=============================== | 47%
|
|=============================== | 48%
|
|================================ | 49%
|
|================================ | 50%
|
|================================= | 50%
|
|================================= | 51%
|
|================================= | 52%
|
|================================== | 52%
|
|================================== | 53%
|
|=================================== | 53%
|
|=================================== | 54%
|
|=================================== | 55%
|
|==================================== | 55%
|
|==================================== | 56%
|
|===================================== | 56%
|
|===================================== | 57%
|
|===================================== | 58%
|
|====================================== | 58%
|
|====================================== | 59%
|
|======================================= | 59%
|
|======================================= | 60%
|
|======================================= | 61%
|
|======================================== | 61%
|
|======================================== | 62%
|
|========================================= | 62%
|
|========================================= | 63%
|
|========================================= | 64%
|
|========================================== | 64%
|
|========================================== | 65%
|
|=========================================== | 65%
|
|=========================================== | 66%
|
|=========================================== | 67%
|
|============================================ | 67%
|
|============================================ | 68%
|
|============================================= | 68%
|
|============================================= | 69%
|
|============================================= | 70%
|
|============================================== | 70%
|
|============================================== | 71%
|
|============================================== | 72%
|
|=============================================== | 72%
|
|=============================================== | 73%
|
|================================================ | 73%
|
|================================================ | 74%
|
|================================================ | 75%
|
|================================================= | 75%
|
|================================================= | 76%
|
|================================================== | 76%
|
|================================================== | 77%
|
|================================================== | 78%
|
|=================================================== | 78%
|
|=================================================== | 79%
|
|==================================================== | 79%
|
|==================================================== | 80%
|
|==================================================== | 81%
|
|===================================================== | 81%
|
|===================================================== | 82%
|
|====================================================== | 82%
|
|====================================================== | 83%
|
|====================================================== | 84%
|
|======================================================= | 84%
|
|======================================================= | 85%
|
|======================================================== | 85%
|
|======================================================== | 86%
|
|======================================================== | 87%
|
|========================================================= | 87%
|
|========================================================= | 88%
|
|========================================================== | 89%
|
|========================================================== | 90%
|
|=========================================================== | 90%
|
|=========================================================== | 91%
|
|=========================================================== | 92%
|
|============================================================ | 92%
|
|============================================================ | 93%
|
|============================================================= | 93%
|
|============================================================= | 94%
|
|============================================================= | 95%
|
|============================================================== | 95%
|
|============================================================== | 96%
|
|=============================================================== | 96%
|
|=============================================================== | 97%
|
|=============================================================== | 98%
|
|================================================================ | 98%
|
|================================================================ | 99%
|
|=================================================================| 99%
|
|=================================================================| 100%
plot(tracts2000, border="red")
plot(tracts2010, add=T)
Use the sp library to convert the address_data to a SpatialPointsDataFrame
Then, set the coordinate reference system, or CRS to be the same as that of the census tracts.
address_data_sp<- address_data #make copy
coordinates(address_data_sp) <-c("glon", "glat")
proj4string(address_data_sp) <- CRS(proj4string(tracts2010))
The Google Geocoding API returns geographic coordinates referenced to the World Geodetic System of 1984 or WGS84.
Census geographic data are referenced to the North American Datum of 1983 or NAD83.
These two geographic CRSs can differ by up to 1 meter in the continental USA.
It’s up to you to decide if this could impact your spatial analysis.
plot(tracts2000, border="red")
plot(tracts2010, add=T)
points(address_data_sp, col="green")
The sp::over function spatially compares the two input spatial objects and returns the data values for the census tract in which each point is located.
fips2000 <- over(address_data_sp, tracts2000)
fips2010 <-over(address_data_sp, tracts2010)
# Uncomment to take a look at the output
#head(fips2000, 3)
#head(fips2010, 3)
address_data$GEOID10 <- fips2010$GEOID10
address_data$CTIDFP00 <- fips2000$CTIDFP00
# head(address_data)
Discuss the 3 methods for getting the census FIPS codes for the geocoded points.
When geocoding
Using an API
Using spatial overlay
We can take this one step further and retrieve census data for the census tracts of interest.
We do this with the tidycensus package
library(tidycensus)
## Warning: package 'tidycensus' was built under R version 3.4.3
# Load census key - Patty's key, apply for your own!
my_census_api_key <- "f2d6f4f743545d3a42a67412b05935dc7712c432"
# Make tigris aware of census key
census_api_key(my_census_api_key)
## To install your API key for use in future sessions, run this function with `install = TRUE`.
my_states<- c("06") # CA
my_counties <- c("001") # Alameda County
Our scenario of interest is to identify liquor stores in census tracts with a relatively high population of minors.
So, we want to find the census variables for total population and population under 18
cenvar_table <-load_variables(year=2016, dataset = "acs5", cache=T)
# Take a look at the data frame to find the names of census tables
# This will give you the table and variable codes
# Use "View(cenvar_table)" and filter within the table to get the codes for a specific variable
# View(cenvar_table)
Indentify the tables and variables of interest in the 5 year ACS Data (2012-2016)
pop_total <- "B01001_001E"
pop_under18 <- "B09001_001E"
pop_acs5_2016 <-get_acs(geography = "tract",
variables = c(pop_total,pop_under18),
year=2016, survey="acs5",
state = my_states, county = my_counties,
geometry = F)
## Please note: `get_acs()` now defaults to a year or endyear of 2016.
#head(pop_acs5_2016)
Load libraries
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.4.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.4
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Select the columnbs of interest
# and put `totpop` and `under18` in their own columns
pop2 <- pop_acs5_2016 %>%
select("GEOID","variable","estimate") %>%
spread(key=variable, value=estimate)
# Rename columns
colnames(pop2)<-c("GEOID10","totpop","under18")
head(pop2)
## # A tibble: 6 x 3
## GEOID10 totpop under18
## <chr> <dbl> <dbl>
## 1 06001400100 3018. 387.
## 2 06001400200 1960. 335.
## 3 06001400300 5236. 922.
## 4 06001400400 4171. 655.
## 5 06001400500 3748. 500.
## 6 06001400600 1661. 233.
pop2$pct_under18 <- round((pop2$under18 / pop2$totpop) * 100, 1)
address_data2 <- merge(address_data, pop2, by="GEOID10", all.x=T)
# Take a look - what do you think?
#head(address_data2)
We just joined the census data to the address points.
We can also join it to the census tracts.
tracts2010 <- merge(tracts2010, pop2, by="GEOID10")
# Take a look:
# head(tracts2010@data)
Note: this is sp::merge so we do not reference the @data slot!
quantColors <- colorQuantile("Reds", tracts2010$pct_under18, n=5)
point_map <- leaflet() %>%
addTiles() %>%
addPolygons(data=tracts2010,
color="white",
weight=1,
opacity=0.5,
fillColor= ~quantColors(pct_under18),
fillOpacity = 0.75,
popup = paste0("<b>Percent under 18:</b> ", tracts2010$pct_under18, "%")) %>%
addMarkers(data=address_data, lat=~glat, lng=~glon,
popup=(paste0(address_data$name, "</br>",
address_data$full_address)
)
)
point_map
write.csv(address_data2, file="output/address_data_geocoded2.csv", row.names=F)
#writeOGR(tracts2010, dsn="./output", layer="tracts2010", driver="ESRI Shapefile")
Review the documents and websites linked in this tutorial
Sign up for a consult!